{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Beta-NMF for GPGPU\n",
    "In this example the Beta-NMF is used to train dictionnary on a dataset X and project some test data Y on the dictionnary trained on X.\n",
    "\n",
    "### Links\n",
    "Source code is available at https://github.com/rserizel/beta_nmf\n",
    "\n",
    "Dcoumentation is available at http://rserizel.github.io/beta_nmf/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import packages\n",
    "If you plan on using GPGPU make sure you set the Theano flags correctly (see [Theano documentation]).\n",
    "\n",
    "[Theano documentation]: http://deeplearning.net/software/theano/ "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from beta_nmf import BetaNMF\n",
    "from base import nnrandn\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training data\n",
    "Generate random nonnegative data to decompose. At this stage you can decide to load your own data.\n",
    "\n",
    "Note that in order to notice the impact of using a GPGPU you should use much larger matrices. For a more extensive study of the advantage of using GPGPU for NMF please refer to:\n",
    "\n",
    "> R. Serizel, S. Essid, and G. Richard. “Mini-batch stochastic approaches for accelerated multiplicative updates in nonnegative matrix factorisation with beta-divergence”. Accepted for publication In *Proc. of MLSP*, p. 5, 2016."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "X = nnrandn((5000, 200))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the BetaNMF\n",
    "Create a BetaNMF object. Theano variable are initialised based on the shape of the data X."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "nmf = BetaNMF(X.shape, n_iter=10, verbose=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the model\n",
    "Decompose the data X into two nonnegative matrices W and H. Perform 10 iterations and compute the score for each iteration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "nmf.fit(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use the model\n",
    "Generate test data Y and project Y on the dictionnary W learnt on X."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "Y = nnrandn((500, 200))\n",
    "H = nmf.transform(Y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use custom init values\n",
    "You can set W and H to custom values with Theano function `set_value()`. Then you should you warm start mode to avoid aoverwritting these values().\n",
    "\n",
    "For example, below we set H to a custom value and start projection on W with this value as intial factor H."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "nmf.h.set_value(np.ones(H.shape))\n",
    "H = nmf.transform(Y, warm_start=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Note on the factors shape\n",
    "In cold start mode the factors are initialised randomly to the correct shape. When using warm start, you should make sure that the shape of the factors is consistent with the data shape. See examples below.\n",
    "### This will work\n",
    "Resume projection of Y at iteration 10: data shapes are consistent.\n",
    "\n",
    "Note also that the initial cost is equal to the final cost in the previous cell which is the expected behaviour here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "H = nmf.transform(Y, warm_start=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### This will not work\n",
    "Project X starting from a projection of Y: data shapes are not consistent X.shape = (5000,200) and H.shape = (500, 50)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "H = nmf.transform(X, warm_start=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
